
    On Characterizing the Data Access Complexity of Programs

    Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong & Kung's red/blue pebble game for computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower bounds analysis for the red/blue pebble game are of very limited effectiveness when applied to CDAGs of real programs, whose computations are composed of multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm to derive the asymptotic data access lower bounds of programs, as a function of the problem size and cache size.
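    For reference, a classical example of the kind of bound this line of analysis produces (the well-known Hong–Kung result for matrix multiplication, not the composition technique of this paper): multiplying two n × n matrices with the standard algorithm, using a fast memory (cache) of size S, requires data movement

```latex
\[
  Q(n, S) \;=\; \Omega\!\left(\frac{n^{3}}{\sqrt{S}}\right),
\]
```

    i.e., an asymptotic data access lower bound expressed as a function of the problem size n and the cache size S.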

    A multi-center, real-life experience on liquid biopsy practice for EGFR testing in non-small cell lung cancer (NSCLC) patients

    Background: Circulating tumor DNA (ctDNA) is a source of tumor genetic material for EGFR testing in NSCLC. Real-world data about liquid biopsy (LB) clinical practice are lacking. The aim of this study was to describe LB practice for EGFR detection in North Eastern Italy. Methods: We conducted a multi-regional survey on ctDNA testing practices in lung cancer patients. Results: The median time from blood collection to plasma separation was 50 min (20–120 min), the median time from plasma extraction to ctDNA analysis was 24 h (30 min–5 days), and the median turnaround time was 24 h (6 h–5 days). Four hundred and seventy-five patients and 654 samples were tested. One hundred and ninety-two patients were tested at diagnosis, with a 16% EGFR mutation rate. Among the 283 patients tested at disease progression, 35% were T790M+. The main differences in LB results between 2017 and 2018 were the number of LBs performed per patient at disease progression (2.88 vs. 1.2, respectively) and the percentage of T790M+ patients (61% vs. 26%).

    Equivalence classes and conditional hardness in massively parallel computations

    The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale data processing frameworks, and has been receiving increasing attention over the past few years, especially in the context of classical graph problems. So far, the only way to argue lower bounds for this model is to condition on conjectures about the hardness of some specific problems, such as graph connectivity on promise graphs that are either one cycle or two cycles, usually called the one cycle versus two cycles problem. This is unlike the traditional arguments based on conjectures about complexity classes (e.g., P ≠ NP), which are often more robust in the sense that refuting them would lead to groundbreaking algorithms for a wide range of problems. In this paper we present connections between problems and classes of problems that allow the latter type of argument. These connections concern the class of problems solvable in a sublogarithmic number of rounds in the MPC model, denoted by MPC(o(log N)), and the standard space complexity classes L and NL, and suggest conjectures that are robust in the sense that refuting them would lead to many surprisingly fast new algorithms in the MPC model. We also obtain new conditional lower bounds, and prove new reductions and equivalences between problems in the MPC model. Specifically, our main results are as follows.
    • Lower bounds conditioned on the one cycle versus two cycles conjecture can instead be argued under the L ⊈ MPC(o(log N)) conjecture: these two assumptions are equivalent, and refuting either of them would lead to o(log N)-round MPC algorithms for a large number of challenging problems, including list ranking, minimum cut, and planarity testing. In fact, we show that these problems and many others require asymptotically the same number of rounds as the seemingly much easier problem of distinguishing whether a graph is one cycle or two cycles.
    • Many lower bounds previously argued under the one cycle versus two cycles conjecture can be argued under an even more robust (and thus harder to refute) conjecture, namely NL ⊈ MPC(o(log N)). Refuting this conjecture would lead to o(log N)-round MPC algorithms for an even larger set of problems, including all-pairs shortest paths, betweenness centrality, and all of the aforementioned ones. Lower bounds under this conjecture hold for problems such as perfect matching and network flow.
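    To make the baseline behind these conjectures concrete, the one cycle versus two cycles problem is solvable in O(log N) rounds by pointer doubling; the conjectures assert that o(log N) rounds are not achievable. The following minimal sketch simulates that pointer-doubling approach sequentially (the function name and the sequential simulation are illustrative assumptions, not the paper's construction):

```python
def one_vs_two_cycles(succ):
    """Distinguish 'one cycle' from 'two cycles' for a successor array that is
    promised to describe either a single cycle on all n nodes or two disjoint
    cycles, using ceil(log2 n) pointer-doubling rounds.

    After round r, ptr[v] points 2^r steps ahead of v and mn[v] holds the
    minimum label among the 2^r consecutive nodes starting at v; once
    2^r >= n, mn[v] is the minimum label of v's whole cycle, so comparing
    the mn values across nodes reveals whether there is one cycle or two.
    """
    n = len(succ)
    ptr = list(succ)         # after round 0: pointer 2^0 = 1 step ahead
    mn = list(range(n))      # after round 0: min over the single node v
    rounds = 0
    while (1 << rounds) < n:
        mn = [min(mn[v], mn[ptr[v]]) for v in range(n)]   # merge the two covered half-ranges
        ptr = [ptr[ptr[v]] for v in range(n)]             # double the jump length
        rounds += 1
    return ("one cycle" if len(set(mn)) == 1 else "two cycles"), rounds


# Example: a single 8-cycle versus two 4-cycles.
one_cycle = [(i + 1) % 8 for i in range(8)]
two_cycles = [1, 2, 3, 0, 5, 6, 7, 4]
print(one_vs_two_cycles(one_cycle))    # ('one cycle', 3)
print(one_vs_two_cycles(two_cycles))   # ('two cycles', 3)
```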

    On the distributed complexity of large-scale graph computations

    Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a "cookbook" fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration. These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k (i.e., decreases faster than 1/k as k grows), significantly improving over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results:
    • PageRank: We show a lower bound of Ω(n/k^2) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k^2) rounds.
    • Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ω(m/k^(5/3)) rounds. This result also implies the first non-trivial lower bound of Ω(n^(1/3)) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k^(5/3) + n/k^(4/3)) rounds.
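    As a small illustration of the input distribution assumed by this model (each vertex assigned to a machine uniformly at random, with each edge known to the machines hosting its endpoints), here is a minimal sketch; the helper random_vertex_partition and its interface are hypothetical, for illustration only:

```python
import random
from collections import defaultdict

def random_vertex_partition(n, edges, k, seed=0):
    """Assign each of the n vertices to one of k machines uniformly at random;
    every edge becomes known to the (at most two) machines hosting its
    endpoints. This mirrors the random input partition assumed by the model."""
    rng = random.Random(seed)
    home = [rng.randrange(k) for _ in range(n)]   # machine hosting each vertex
    hosted = defaultdict(list)                    # machine -> hosted vertices
    local_edges = defaultdict(set)                # machine -> locally known edges
    for v in range(n):
        hosted[home[v]].append(v)
    for u, v in edges:
        local_edges[home[u]].add((u, v))
        local_edges[home[v]].add((u, v))
    return hosted, local_edges

# Example: partition a 6-node cycle among k = 3 machines.
cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
hosted, local = random_vertex_partition(6, cycle_edges, k=3)
```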

    Tight bounds for parallel paging and green paging

    In the parallel paging problem, there are p processors that share a cache of size k. The goal is to partition the cache among the processors over time in order to minimize their average completion time. For this long-standing open problem, we give tight upper and lower bounds of Θ(log p) on the competitive ratio with O(1) resource augmentation. A key idea in both our algorithms and lower bounds is to relate the problem of parallel paging to the seemingly unrelated problem of green paging. In green paging, there is an energy-optimized processor that can temporarily turn off one or more of its cache banks (thereby reducing power consumption), so that the cache size varies between a maximum size k and a minimum size k/p. The goal is to minimize the total energy consumed by the computation, which is proportional to the integral of the cache size over time. We show that any efficient solution to green paging can be converted into an efficient solution to parallel paging, and that any lower bound for green paging can be converted into a lower bound for parallel paging, in both cases in a black-box fashion. We then show that, with O(1) resource augmentation, the optimal competitive ratio for deterministic online green paging is Θ(log p), which, in turn, implies the same bounds for deterministic online parallel paging.
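    In symbols, the green paging objective described above asks to schedule a cache size s(t), constrained to lie between k/p and k at all times, so as to minimize the energy

```latex
\[
  \int_{0}^{T} s(t)\,\mathrm{d}t,
  \qquad \frac{k}{p} \;\le\; s(t) \;\le\; k,
\]
```

    where T is the completion time of the computation.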